Generalized-ensemble simulations of spin systems and protein systems 
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ABSTRACT 

In complex systems such as spin systems and protein systems, conventional simula- 
tions in the canonical ensemble will get trapped in states of energy local minima. We 
employ the generalized-ensemble algorithms in order to overcome this multiple-minima 
problem. Three well-known generalized-ensemble algorithms, namely, multicanonical al- 
gorithm, simulated tempering, and replica-exchange method, are described. We then 
present three new generalized-ensemble algorithms based on the combinations of the three 
methods. Effectiveness of the new methods are tested with a Potts model and protein 
systems. 

1 INTRODUCTION 

The protein folding problem is one of the most challenging problems in computational 
biophysics. The difficulty comes from the fact that the number of possible conformations 
for each protein is astronomically large. Simulations by conventional methods such as 
Monte Carlo (MC) or molecular dynamics (MD) algorithms in canonical ensemble will 
necessarily get trapped in one of many local-minimum states in the energy function. In 
order to overcome this multiple-minima problem, many methods have been proposed (for 
a review, see, e.g., Ref. [l|]). 
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One way to alleviate the difficulty is to perform a simulation in a generalized ensemble 
where each state is weighted by a non-Boltzmann probability weight factor so that a 
random walk in potential energy space may be realized. The random walk allows the 
simulation to escape from any energy barrier and to sample much wider configurational 
space than by conventional methods. Monitoring the energy in a single simulation run, 
one can obtain not only the global-minimum-energy state but also canonical ensemble 
averages as functions of temperature by the single-histogram and multiple-histogram 
Q reweighting techniques. 

One of the most well-known generalized-ensemble methods is perhaps multicanonical 
algorithm (MUCA) 0] (for a recent review, see Ref. [§]). MUCA was first introduced to 
the molecular simulation field in Ref. || . Since then MUCA has been extensively used in 
many applications in protein and related systems (for a review, see, e.g., Ref. 0). 

While a simulation in multicanonical ensemble performs a free ID random walk in 
potential energy space, that in simulated tempering (ST) || performs a free random walk 
in temperature space (for a review, see, e.g., Ref. f9|). This random walk, in turn, induces 
a random walk in potential energy space and allows the simulation to escape from states of 
energy local minima. ST has also been introduced to the protein folding problem jn| |Tl| . 

The generalized-ensemble method is powerful, but in the above two methods the prob- 
ability weight factors are not a priori known and have to be determined by iterations of 
short trial simulations. This process can be non-trivial and very tedius for complex sys- 
tems with many local-minimum-energy states. Therefore, there have been attempts to 



accelerate the convergence of the iterative process for MUCA |T_2], [131 ( see a l so ^- e ^ E 

In the replica- exchange method (REM) ||15|| , the difficulty of weight factor determina- 
tion is greatly alleviated. (REM is also referred to as multiple Markov chain method [1 



and parallel tempering 11. For recent reviews with detailed references about the method, 



see, e.g., Refs. [[171 , |T8||.) In this method, a number of non-interacting copies (or replicas) 
of the original system at different temperatures are simulated independently and simul- 
taneously by the conventional MC or MD method. Every few steps, pairs of replicas are 
exchanged with a specified transition probability. REM has also been introduced to the 
protein folding problem [pi| p0| . We further developed a multidimensional REM which is 
particularly useful in free energy calculations |2T| . 

However, REM also has a computational difficulty: As the number of degrees of 
freedom of the system increases, the required number of replicas also greatly increases, 
whereas only a single replica is simulated in MUCA or ST. This demands a lot of com- 
puter power for complex systems. Our solution to this problem is: Use REM for the 
weight factor determinations of MUCA or ST, which is much simpler than previous iter- 
ative methods of weight determinations, and then perform a long MUCA or ST produc- 
tion run. The methods are referred to as the replica- exchange multicanonical algorithm 
(REMUCA) and the replica- exchange simulated tempering (REST) [p3|| . We have 
introduced a further extension of REMUCA, which we refer to as multicanonical replica- 
exchange method (MUCAREM) |22|. In MUCAREM, the multicanonical weight factor is 
first determined as in REMUCA, and then a replica-exchange multicanonical production 
simulation is performed with a small number of replicas (for a review of all these new 
methods, see Ref. fll7|). 

In this article, we describe the six generalized-ensemble algorithms mentioned above. 
Namely, we first describe the three familiar methods: MUCA, ST, and REM. We then 
present the three new algorithms: REMUCA, REST, and MUCAREM. The effectiveness 
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of these methods is tested with a 2- dimensional Potts model and protein systems. 



2 METHODS 

In the regular canonical ensemble with a given inverse temperature j3 = 1/ksT (ks is the 
Boltzmann constant), the probability distribution of potential energy E is given by 

P B (E;T) oc n(E) W B {E-T) = n{E) e^ E , (1) 

where n(E) is the density of states. Since the density of states n(E) is a rapidly increasing 
function of E and the Boltzmann factor Wb{E;T) decreases exponentially with E, the 
probability distribution Pb{E; T) has a bell-like shape in general. However, it is very dif- 
ficult to obtain canonical distributions at low temperatures with conventional simulation 
methods. This is because the thermal fluctuations at low temperatures are small and the 
simulation will certainly get trapped in states of energy local minima. 

Multicanonical algorithm (MUCA) Q is one of the most well-known generalized- 
ensemble algorithms. In the "multicanonical ensemble" the probability distribution of 
potential energy is defined as follows: 

P mu {E) oc n(E) W mu {E) = constant . (2) 

Because the multicanonical weight factor W mu (E) is (proportional to the inverse of the 
density of states and) not a priori known, one has to determine it for each system by 
iterations of trial simulations. See, for instance, Ref. || for details of the method to 
determine the MUCA weight factor W mu (E). 

After the optimal MUCA weight factor is obtained, one performs a long MUCA simu- 
lation once. By monitoring the potential energy throughout the simulation, one can find 
the global- minimum-energy state. Moreover, by using the obtained histogram N mn (E) of 
the potential energy distribution P ma (E), the expectation value of a physical quantity A 
at any temperature T = 1 / k^fi can be calculated from 

£ A(E) n(E) e~ pE 

<A>T= \n(E)^ ■ (3) 

E 

where the best estimate of the density of states is given by the single-histogram reweighting 
techniques (see Eq. (Q)) [||: 

"<^W (4) 

In simulated tempering (ST) || temperature itself becomes a dynamical variable, and 
both the configuration and the temperature are updated during the simulation with a 
weight: 

W ST (E;T m ) = e-^ E+a - , (5) 

where we discretize the temperature in M different values, T m (m = 1, ■ ■ • , M). Without 
loss of generality we can order the temperature so that T\ < T 2 < ■ ■ • < Tm- The lowest 
temperature T\ should be sufficiently low so that the simulation can explore the global- 
minimum-energy region, and the highest temperature Tm should be sufficiently high so 
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that no trapping in a local-minimum-energy state occurs. The parameters a m are chosen 
so that the probability distribution of temperature is flat: 

^ST(T m ) = JdE n(E) W ST (E; T m ) = J dE n(E) e~ PmE+am = constant . (6) 

Hence, in simulated tempering the temperature is sampled uniformly. A free random walk 
in temperature space is realized, which in turn induces a random walk in potential energy 
space and allows the simulation to escape from states of energy local minima. 

The parameters a m are not known a priori and have to be determined by iterations of 



short simulations. See, for instance, Ref. | ] for details of the method to determine the 



ST weight factor WsT(E;T mJ 

Note that from Eq. (0) we have 



e~ am oc J dE n(E) e~ PmE . (7) 

The parameters a m are therefore "dimensionless" Helmholtz free energy at temperature 
T m (i.e., the inverse temperature j3 m multiplied by the Helmholtz free energy). 

A simulation of ST is realized by alternately performing the following two steps ||. 
Step 1: A canonical MC or MD simulation at the fixed temperature T m is carried out for 
a certain MC or MD steps. Step 2: The temperature T m is updated to the neighboring 
values T m ±i with the configuration fixed. The transition probability of this temperature- 
updating process is given by the Metropolis criterion (see Eq. (|J)): 

w(T rn -► T m± i) = min(l, e" A ) , (8) 

where 

A = (Anil ~ An) E ~ (°W±1 - Om) • (9) 

After the optimal ST weight factor is determined, one performs a long ST simulation 
once. From the results of this production run, one can obtain the canonical ensemble 
average of a physical quantity A as a function of temperature from Eq. (|3[), where the 
density of states is given by the multiple-histogram reweighting techniques || as follows. 
Let N m (E) and n m be respectively the potential-energy histogram and the total number 
of samples obtained at temperature T m = 1/ fee An (m — 1, ■ ■ ■ , M). The best estimate of 
the density of states is then given by ||] 

M 

£ g- 1 N m (E) 
<E) = , (10) 



^ ] 9m 



I 1™ O 



m=l 



where 

e -/m = J- n {E) e~ PmE . (11) 

E 

Here, g m = 1 + 2r m , and r m is the integrated autocorrelation time at temperature T m . 
Note that Eqs. flTDD and (|TTJ) are solved self-consistently by iteration to obtain the 
dimensionless Helmholtz free energy f m and the density of states n(E). 
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The system for replica-exchange method (REM) |T5J consists of M non-interacting 
copies, or replicas, of the original system in canonical ensemble at M different temper- 
atures T m (m = 1, ■ • • , M). We arrange the replicas so that there is always one replica 
at each temperature. Then there is a one-to-one correspondence between replicas and 
temperatures. Let X = {• ■ • , x^j, • • -j stand for a state in this generalized ensemble. 

Here, the superscript i and the subscript m in xjj] label the replica and the temperature, 
respectively. The state X is specified by the M sets of coordinates q^. A simulation 
of REM is then realized by alternately performing the following two steps |15| (for de- 
tails of the molecular dynamics version of REM, see Ref. |20|). Step 1: Each replica 



in the canonical ensemble at a fixed temperature is simulated simultaneously and inde- 
pendently for a certain number of MC or MD steps. Step 2: A pair of replicas, say i 
and j, which are at neighboring temperatures, say T m and T m+ i, respectively, are ex- 
changed: X = {• • • , xW, • • • , • • ■} X' = {• ■ • , zjg, • • • , xJJ +1 , • • •}. The transition 
probability of this replica exchange is given by the Metropolis criterion: 

w(X -*X') =min(l,e- A ) , (12) 

where 

A ee (f3 m+1 - f3 m ) (E ( g M) - E (f)) . (13) 

The replica- exchange multicanonical algorithm (REMUCA) |E| and replica- exchange 
simulated tempering (REST) P3| overcome both the difficulties of MUCA and ST (the 
weight factor determination is non-trivial) and REM (a lot of replicas, or computation 
time, is required). 

In REMUCA |^2[ we first perform a short REM simulation (with M replicas) to 
determine the MUCA weight factor and then perform with this weight factor a regular 
MUCA simulation with high statistics. The first step is accomplished by the multiple- 
histogram reweighting techniques 0. Let N m (E) and n m be respectively the potential- 
energy histogram and the total number of samples obtained at temperature T m = l/k^f3 m 
of the REM run. The density of states n{E) is then given by solving Eqs. fllCf) and (|Xlf) 
self-consistently by iteration ||. Once the estimate of the density of states is obtained, 
the multicanonical weight factor can be directly determined from Eq. (§). 

In REST PB| , just as in REMUCA, we first perform a short REM simulation (with 



M replicas) to determine the ST weight factor and then perform with this weight factor a 
regular ST simulation with high statistics. The first step is accomplished by the multiple- 
histogram reweighting techniques [[J , which give the dimensionless Helmholtz free energy 
f m (see Eqs. ( |TDD and ([TTj) ) . Once the estimate of the dimensionless Helmholtz free energy 
f m are obtained, the simulated tempering weight factor can be directly determined by 
using Eq. @ where we set a m = f m (compare Eq. (fr|) with Eq. ([□]))• 

The formulations of REMUCA and REST are simple and straightforward, but the 
numerical improvement is great, because the weight factor determination for MUCA and 
ST becomes very difficult by the usual iterative processes for complex systems. 

While multicanonical simulations are usually based on local updates, a replica-exchange 
process can be considered to be a global update, and global updates enhance the sam- 
pling further. Here, we present a further modification of REMUCA and refer to the new 
method as multicanonical replica- exchange method (MUCAREM) |22j| . In MUCAREM 
the final production run is not a regular multicanonical simulation but a replica-exchange 
simulation with a few replicas in the multicanonical ensemble. Because multicanonical 
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simulations cover much wider energy ranges than regular canonical simulations, the num- 
ber of required replicas for the production run of MUCAREM is much less than that for 
the regular REM, and we can keep the merits of REMUCA (and improve the sampling 
further). 



T = 0.6000 

t = o nze 




B 



Figure 1: Probability distributions of energy of 2-dimensional 10-state Potts model at 
three temperatures: T = 0.6000, 0.7026, and 0.8000. The results were obtained from a 
multicanonical MC simulation. 



3 RESULTS 

We now present the results of our simulations based on the algorithms described in the 
previous section. 

The first example is a spin system. We studied the 2-dimensional 10-state Potts model 
24|j . The lattice size was 34 x 34. This system exhibits a first-order phase transition ES[ . 



In Figure 1 we show the probability distributions of energy at three tempeartures (above 
the critical temperature Tc, at Tc, and below Tc)- At the critical temperature we observe 
two peaks in the distribution, indicating that the system indeed undergoes a first-order 
phase transition. 



In Figure 2 we show how the iterative procedure [13| for the MUCA weight factor 
determination converges. We see that a flat distribution in the entire energy range was 
obtained after 960,000 MC sweeps. Note that the convergence slows down drastically near 
the global-minimum-energy region (step from 300,000 MC sweeps to 960,000 MC sweeps). 

In Figure 3 we show the results of our new method for the MUCA weight factor 
determination. We first made a REM simulation of 10,000 MC sweeps (for each replica) 
with 32 replicas (Figure 3(a)). Using the obtained energy distributions, we determined 
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Figure 2: Iterative process of multicanonical weight factor determination for the 2- 
dimensional 10-state Potts model. The results after 100,000 MC sweeps, 200,000 MC 
sweeps, 300,000 MC sweeps, and 960,000 MC sweeps are superimposed. 




Figure 3: Probability distributions of energy for the 2-dimensional 10-state Potts model: 
the results of REM simulation with 32 replicas (a) and iterations of MUCAREM simula- 
tions with 8 replicas (b), (c), and (d). 
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Figure 4: Time series (from 120,000 MC sweeps to 300,000 MC sweeps) of potential energy 
of the peptide fragment of ribonuclease Tl from a conventional canonical MC simulation 
at T = 200 K (a) and a multicanonical MC simulation (b). 

the (preliminary) MUCA weight factor by the REMUCA procedure as described in the 
previous section. Because the trials of replica exchange are not accepted near the critical 
temperature for first-order phase transitions, the probability distributions in Figure 3(a) 
for the energy range from ~ —1.5 to ~ —1.0 fails to have sufficient overlap, which is 
required for successful application of REM. This means that the MUCA weight factor, 
or density of states, in this energy range thus determined is of "poor quality." With this 
MUCA weight factor, however, we made iterations of three MUCAREM simulations of 
10,000 MC sweeps (for each replica) with 8 replicas (Figures 3(b), 3(c), 3(d)). In Figure 
3(b) we see that the distributions are not completely flat, reflecting the poor quality in 
the phase-transition region. This problem is rapidly rectified as iterations continue, and 
the distributions are completely flat in Figure 3(d), which gives an optimal MUCA weight 
factor in the entire energy range. The details including the comparisons with the new 
method in Ref. JT3J will be published elsewhere [21 . 



The second example is a protein system. We first illustrate how effectively generalized- 
ensemble simulations can sample the configurational space compared to the conventional 
simulations in the canonical ensemble. It is known by experiments that the system of a 
17-residue peptide fragment from ribonuclease Tl tends to form a-helical conformations. 
We have performed both a canonical MC simulation of this peptide at a low temperature 
(T = 200 K) and a multicanonical MC simulation [26]. In Figure 4 we show the time 
series of potential energy from these simulations. 

We see that the canonical simulation thermalizes very slowly. On the other hand, the 
MUCA simulation indeed performs a random walk in potential energy space covering a 
very wide energy range. Four conformations chosen during this period (from 120,000 MC 
sweeps to 300,000 MC sweeps) are shown in Figure 5 for the MUCA simulation. The 
MUCA simulation indeed samples a wide conformational space. 

The last example is a penta peptide, Met-enkephalin, whose amino-acid sequence 
is: Tyr-Gly-Gly-Phe-Met. In Figure 6, we show the average potential energy of Met- 
enkephalin in gas phase as a function of temperature that was calculated by the single- and 
multiple-histogram reweighting techniques from the four generalized-ensemble algorithms, 



S 




Figure 5: Typical snapshots from the multicanonical MC simulation of Figure 4(b). The 
corresponding numbers of MC sweeps are 138,000 (a), 190,000 (b), 243,000 (c), and 
295,000 (d). The figures were created with Molscript and Raster3D. 
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Figure 6: The average potential energy of Met-enkephalin in gas phase as a function of 
temperature. The results from the four generalized-ensemble algorithms, MUCA, RE- 
MUCA, MUCAREM, and REST, are superimposed. 

MUCA, REMUCA, MUCAREM, and REST J27|. The results are in good agreement. 

4 CONCLUSIONS 

In this article we have described the formulations of the three well-known generalized- 
ensemble algorithms, namely, multicanonical algorithm (MUCA), simulated tempering 
(ST), and replica-exchange method (REM). We then introduced three new generalized- 
ensemble algorithms that combine the merits of the above three methods, which we refer 
to as replica-exchange multicanonical algorithm (REMUCA), replica-exchange simulated 
tempering (REST), and multicanonical replica-exchange method (MUCAREM). 

With these new methods available, we believe that we now have working simulation 
algorithms for spin systems and protein systems. 
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